Clindex: Clustering for Similarity Queries in High-Dimensional Spaces

Authors

Chen Li

Edward Chang

Hector Garcia-Molina

James Ze Wang

Gio Wiederhold

Abstract

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and performs significantly better than other approaches. Our scheme is based on finding clusters, and then building a simple but efficient index for them. We analyze the tradeoffs involved in clustering and building such an index structure, and present experimental results based on a 30,000-image database.
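The cluster-then-search idea in the abstract can be illustrated with a small sketch. This is not the Clindex algorithm: the abstract does not name the clustering method or the index layout, so plain k-means stands in for the clustering step and a list of centroids stands in for the index. The function names (`build_index`, `approx_knn`, `recall`), the `probes` parameter, and the random 8-dimensional data are all assumptions made for illustration. The sketch only shows the tradeoff the abstract describes: scanning fewer clusters reads less data but may miss some true near neighbors.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters


def build_index(points, k, iters=10, seed=0):
    """Stand-in clustering step: plain Lloyd's k-means (an assumption,
    not the clustering method of the paper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
    return centroids, assign(points, centroids)


def approx_knn(query, centroids, clusters, n, probes=1):
    """Scan only the `probes` clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [p for c in order[:probes] for p in clusters[c]]
    return sorted(candidates, key=lambda p: dist(query, p))[:n]


def exact_knn(query, points, n):
    """Brute-force baseline: scan every point."""
    return sorted(points, key=lambda p: dist(query, p))[:n]


def recall(approx, exact):
    """Fraction of the true near neighbors that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)


# Hypothetical data: 500 random points in 8 dimensions.
rng = random.Random(42)
points = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
centroids, clusters = build_index(points, k=10)

query = points[0]
exact = exact_knn(query, points, 10)
for probes in (1, 3, 10):
    approx = approx_knn(query, centroids, clusters, 10, probes)
    print(f"probes={probes} recall={recall(approx, exact):.2f}")
```

Raising `probes` scans more clusters and so trades extra reads for higher recall; with `probes` equal to the number of clusters the search degenerates into the brute-force scan.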

Similar resources

Clustering for Approximate Similarity Search in High-Dimensional Spaces

In this paper we present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and perform sign...

DAHC-tree: An Effective Index for Approximate Search in High-Dimensional Metric Spaces

Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry o...

CSVD: Clustering and Singular Value Decomposition for Approximate Similarity Search in High-Dimensional Spaces

High-dimensionality indexing of feature spaces is critical for many data-intensive applications such as content-based retrieval of images or video from multimedia databases and similarity retrieval of patterns in data mining. Unfortunately, even with the aid of the commonly-used indexing schemes, the performance of nearest neighbor (NN) queries (required for similarity search) deteriorates rapi...

CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces

The clustering problem, which aims at identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity clusters, has been widely studied. Traditional clustering algorithms use distance functions to measure similarity and are not suitable for high dimensional spaces. In this paper, we propose CoFD algorithm, which is a non-dis...

Using the Distance Distribution for Approximate Similarity Queries in High-Dimensional Metric Spaces

We investigate the problem of approximate similarity (nearest neighbor) search in high-dimensional metric spaces, and describe how the distance distribution of the query object can be exploited so as to provide probabilistic guarantees on the quality of the result. This leads to a new paradigm for similarity search, called PAC-NN (probably approximately correct nearest neighbor) queries, aiming...

Publication date: 1999
